Constructing prediction models and analyzing factors in suicidal ideation using machine learning, focusing on the older population

Suicide among the older population is a significant public health concern in South Korea. Because older individuals often contemplate suicide for a long time before attempting it, analyzing the suicidal ideation that precedes a suicide attempt is important for intervention. In this study, six machine learning algorithms were employed to construct a predictive model for suicidal ideation and identify key variables. A traditional logistic regression analysis was additionally conducted to test the robustness of the machine learning results. All analyses used a hierarchical approach to compare model fit in both machine learning and logistic regression. Three models were established for analysis. Model 1 incorporated socioeconomic, residential, and health behavioral factors. Model 2 expanded upon Model 1 by integrating physical health status, and Model 3 further incorporated mental health conditions. The results indicated that the gradient boosting algorithm outperformed the other machine learning techniques. Furthermore, the household income quintile was the most important feature in Model 1, followed by subjective health status, oral health, and exercise ability in Model 2, and anxiety and depression in Model 3. These results correspond to those of the hierarchical logistic regression. Notably, economic and residential vulnerabilities are significant factors in the mental health of the older population with higher instances of suicidal thoughts. This hierarchical approach could reveal the potential target population for suicide interventions.


Introduction
Suicide is a serious global public health issue [1,2]. South Korea, in particular, has shown the highest suicide rate among the Organization for Economic Co-operation and Development (OECD) countries for a decade [3,4]. The suicide rate in South Korea in 2020 was 24.1 per 100,000, which was 2.23 times higher than the OECD standardized population rate of 11.0 per 100,000 [3,4].
Meanwhile, globally, the number of suicide deaths is highest among middle-aged adults in high-income countries [2]. South Korea shows a similar distribution. The highest number of suicides occurred among individuals in their 50s, and suicide rates per 100,000 people substantially increased from the age of 60 years (28.4%), reaching a peak of 62.6% in the 80s [5].
Suicide in the elderly population exhibits distinct characteristics compared to younger individuals [6]. Younger generations often engage in impulsive suicide attempts, which result in a lower completion rate [7]. In contrast, elderly individuals tend to carefully consider various circumstances and plan their suicide in detail [8], leading to a higher completion rate [9][10][11]. Hence, suicidal ideation has emerged as a potent predictor of suicide attempts, particularly among the elderly. Considering their mutable and time-consuming nature, targeting these ideations could be a pivotal focus of effective suicide prevention interventions.
Previous literature has concentrated on incidents of suicide attempts and completed suicides, and comparatively less on suicidal ideation [6,12]. Even within the existing literature, most studies treat suicidal ideation merely as a proxy for suicide attempts or completed suicide. However, recent studies have recognized the importance of suicidal thoughts as a crucial prevention indicator because of their significant role as predictors immediately preceding a suicide attempt [13]. We argue that the significance of suicidal ideation transcends mere prediction; rather, it resides in its meaning. Many public health policymakers regard suicide as a mental disease [14]. Nevertheless, what distinguishes suicide from conventional diseases is that it incorporates individuals' perspectives on and assessments of their lives and the social environment in which they are engaged. Therefore, it is necessary to interpret suicidal thoughts in the context of their lives and social environment rather than simply viewing them from the perspective of a mental illness that can be treated with hormonal control and medication.
A majority of related studies have relied on conventional statistical methodologies, including chi-square, multiple logistic regression, multi-level analysis, and similar approaches [6,9,12,13,15,16,17]. Furthermore, the variables employed in each research model differ significantly from one another, reflecting the diverse range of related articles distributed across various research domains, such as gerontology, nursing, social behavior, and public health [15][16][17]. A comprehensive analysis, irrespective of the research domain, requires extensive data that encompass diverse variables and ensure an ample sample size that is representative of the entire population. In addition, appropriate techniques are required to handle such large datasets. A limitation of traditional statistical techniques is that the number of input variables is limited when conducting regression analysis. To overcome these problems, social science researchers have recently applied machine learning to various subjects [18][19][20][21].
Machine learning is defined as the process of building computer systems that automatically learn from data, fit a model, and test the fitted model on a subset of the data [22]. Such algorithms learn from data without limiting the number of covariates and suggest the best model. Evidence has been presented indicating that the explanatory power of machine learning models surpasses that of traditional statistical techniques, such as multiple regression analysis. Machine learning also has the advantage of deriving 'feature importance' through sub-analysis, allowing the determination of the variables that are most crucial in predicting the outcome variable. However, a shortcoming of machine learning is its inability to articulate the direction in which the dependent variable changes according to the specific levels of the independent variables. For example, machine learning can identify marital status as the most important variable for predicting suicidal ideation, but it does not distinguish whether married, unmarried, or divorced individuals are more likely to experience it. Therefore, this study enhances the robustness of machine learning modeling results by supplementing them with a chi-square test and multiple logistic regression analysis.
It is commonly accepted that suicide is determined not only by psychological factors [16], but also by socioeconomic status, physical health, and health behavior [12]. Furthermore, recent studies have emphasized the macro-level factors surrounding each individual [23,24]. Han and Lee [12] employed multilevel analyses, including individual, household, and administrative area variables. However, machine learning cannot distinguish the macro level from individual-set data. Therefore, we decided to include individuals and households at the mezzo level. Meanwhile, existing studies did not include sufficient residential factors, such as the type of house, home ownership, and residential location. Residential factors provide a rich interpretation of environmental factors, economic symbols, and social status.
The purpose of this study was to derive the most appropriate model and important features of suicidal ideation for the older population by using machine learning algorithms. Additionally, it sought to identify factors using traditional statistical techniques and compare the outcomes between the two methods. Furthermore, this study aimed to highlight the effects of socioeconomic factors, health behavior, and residential factors, which may be obscured by physical and mental health variables, using a hierarchical method. As psychological factors are strongly correlated with suicidal ideation, the significance of other factors diminishes when all the factors are included in a model simultaneously. This mechanism also operates in machine learning algorithms. Therefore, we conducted hierarchical modeling using three separate models.

Study data
Data sources. This study utilized 2018 data from the Korea Health Panel (KHP), a dataset investigated and managed by the National Health Insurance Services (NHIS) and the Korea Institute for Health and Social Affairs. The KHP provides reliable and representative data due to several key characteristics. First, samples were collected using a two-stage stratified random cluster sampling method based on the nationwide Population and Housing Census. Additionally, trained investigators from the KHP conducted face-to-face interviews with the participants and utilized supplementary tools, such as a household ledger, health diary, or medical care receipts, to minimize information loss and recall bias. The total sample size of the KHP in 2018 was 17,008 individuals. After excluding non-responses and missing values, the remaining sample comprised 12,698 individuals. Among them, 7,170 were aged 50 or older. For the machine learning analysis, the ratio of the presence to the absence of suicidal thoughts was adjusted to 1:1 using random undersampling (RUS). This was done to prevent overfitting errors caused by imbalanced data, resulting in a reduced sample size of 340 participants. This study was approved by the institutional review board of Yonsei University (approval number: 1041849-202310-SB-200-01).
Conceptual framework. The conceptual framework for this study is depicted in Fig 1. This framework suggests that various factors, including sociodemographic factors, health behaviors, residential characteristics, and physical and mental health, can predict the occurrence of suicidal ideation. A detailed explanation of these variables is provided in the following section.
Outcome variables. The dependent variable in this study was suicidal ideation, as defined by the question "Have you ever thought you wanted to die, in the past year?" If the response to the question was "Yes," it was coded as 1; otherwise, it was coded as 0.
Independent variables. The advantage of machine learning techniques is that there is no limit to the input variables, and all variables provided by the data source can be input; however, conclusions can only be drawn from the given data values. Machine learning does not comprehend the relationships between variables in a human interpretive sense; rather, it interprets these relationships solely through numerical patterns. Therefore, this study included variables from existing studies with theoretical backgrounds to provide logical evidence [6,9,12,13,17]. Additionally, we adjusted the values of the variables for use in hierarchical multiple logistic regression (HMLR). Table 1 displays the computed codes of the variables, excluding the missing values.
Table 1 shows the independent variables included in this study, comprising sociodemographic, health behavior, residential, and physical and mental health factors. To address the impact of chronic diseases, we employed the Charlson Comorbidity Index (CCI) in the traditional statistical analyses. Meanwhile, in our machine learning analyses, we utilized raw codes from the Korean version of the International Classification of Diseases (KCD) because there are no variable limitations and there is less need for manipulation of input variables. Health shock was defined as hospitalization for more than 7 days, according to the methods described by Jung et al. [25] and Nguyen et al. [26].

Analysis
Hierarchical modeling. This study involved the construction of Model 1, which combines socioeconomic, residential, and health behavior factors; Model 2, an extension of Model 1 with the inclusion of physical health factors; and Model 3, which adds mental health variables to Model 2. Hierarchical machine learning modeling and HMLR analysis were conducted to reveal the effects of socioeconomic and residential factors more efficiently, considering the significant hierarchical relations of psychological factors. Data preprocessing, descriptive statistics, and regression analyses were performed using Stata version 18. Machine learning modeling and visual score graphs were produced using Python (version 3.9.10).
Hierarchical machine learning modeling. Machine learning modeling is a method in which a machine autonomously learns from data and selects a model that optimally predicts outcomes. During the machine learning process, the predictive performance is assessed by iteratively adding and removing variables from the model. In this iterative process, the feature importance is calculated by evaluating whether a specific variable significantly contributes to reducing prediction errors when added to or removed from the model [25]. Machine learning methods are broadly categorized into supervised learning, in which researchers specify variables, and unsupervised learning, in which only data are provided without predefined variables. Therefore, we hierarchically input variables into the model based on their influence, as identified in prior research. Models 1, 2, and 3 were constructed using the researchers' prior knowledge, and the performance of each model was evaluated at every stage. We refer to this as hierarchical machine learning modeling. Specifically, we developed several models to predict the dependent variable using logistic regression, gradient boosting, naive Bayes, K-nearest neighbors (KNN), support vector machine, and deep neural network, and compared the performance of each model. Sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC) were used to provide a comprehensive and precise evaluation. To prepare the data for machine learning analysis, we performed a data split, allocating 75% of the total data as training data and reserving the remaining 25% as test data. Additionally, we implemented a RUS procedure on the training data at a 1:1 ratio to mitigate overfitting errors in the unbalanced data. Machine learning becomes inefficient when the distribution of the binary dependent variable values is highly imbalanced [26]. Therefore, this study addresses this imbalance by employing RUS to equalize the numbers of individuals with and without suicidal ideation in the training dataset.
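The four evaluation metrics named above can be illustrated with a minimal Python sketch. The labels and scores below are invented for illustration and are not the study's data, the 0.5 decision threshold is an assumption, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library routine.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # share of true positives that are detected
        "specificity": tn / (tn + fp),   # share of true negatives that are detected
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = suicidal ideation reported, 0 = not reported.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = confusion_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, round(auc_value, 3))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason the four are reported together.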
The RUS process is illustrated in Fig 2. The raw dataset comprises 7,170 older adults. Had machine learning been applied without RUS (which we did not do), the data would have been divided into a training dataset of 5,377 individuals (5,207 without and 170 with suicidal ideation) and a test dataset of 1,793 individuals (1,739 without and 54 with suicidal ideation). After implementing RUS, the number of individuals without suicidal ideation in the training dataset was balanced with the number of individuals with suicidal ideation, resulting in 340 individuals (170 with and 170 without suicidal ideation). The test dataset was left unchanged at 1,793 individuals (1,739 without and 54 with suicidal ideation).
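The balancing step can be sketched in a few lines of Python. This is a minimal illustration that uses only the training-set counts reported above (170 with and 5,207 without suicidal ideation); the function name, the seed, and the use of the standard `random` module are assumptions, and the authors' actual tooling is not stated.

```python
import random


def random_under_sample(labels, seed=0):
    """Return row indices after shrinking the majority class to the minority size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))  # sampling without replacement
    return sorted(minority + kept)


# Training labels with the class counts reported in the text.
train_labels = [1] * 170 + [0] * 5207

balanced_idx = random_under_sample(train_labels, seed=42)
balanced_labels = [train_labels[i] for i in balanced_idx]
print(len(balanced_idx), sum(balanced_labels))
```

Only the training rows are resampled. The test set keeps its original class ratio so that the reported metrics reflect the imbalance found in the population.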
Hierarchical multiple logistic regression analysis. HMLR is a well-known model that goes one step further than the multiple regression model, which measures the relationship between various independent and binary dependent variables [27,28]. The hierarchical regression model considers the systematic order and hierarchy in the process of the independent variables affecting the dependent variable. In this process, the researcher can check the adjusted coefficient whenever a new independent variable is added, better understand the relationships among the independent variables, and effectively comprehend the magnitude of the influence of a specific independent variable on the binary dependent variable [28]. The most significant advantage of a hierarchical regression model is that it expands when new independent variables are discovered in the traditional model. Hierarchical regression can reveal the shortcomings of the conventional model and show how the model improves the estimation precision by adding new predictors.
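One common way to quantify how much a nested model improves when a block of predictors is added is a likelihood-based pseudo-R². The sketch below assumes McFadden's definition, which is the pseudo-R² that Stata's `logit` command reports; the text does not state which variant was used. The outcomes and fitted probabilities are invented for illustration.

```python
import math


def log_likelihood(y_true, probs):
    """Bernoulli log-likelihood of binary outcomes under predicted probabilities."""
    return sum(math.log(p if y == 1 else 1.0 - p) for y, p in zip(y_true, probs))


def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo-R^2: 1 - lnL(fitted model) / lnL(intercept-only model)."""
    return 1.0 - ll_model / ll_null


# Hypothetical outcomes and fitted probabilities (not the study's data).
y = [1, 0, 0, 0]
base_rate = sum(y) / len(y)                     # intercept-only prediction
ll_null = log_likelihood(y, [base_rate] * len(y))
ll_model = log_likelihood(y, [0.7, 0.1, 0.2, 0.1])

r2 = mcfadden_r2(ll_model, ll_null)
print(round(r2, 3))
```

In a hierarchical design, this quantity would be computed once per model, and a rising value across Models 1 to 3 indicates that each added block of variables improves the fit.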

Descriptive analyses
The chi-square test results examining the relationship between the general characteristics of middle-aged and elderly individuals and each dependent variable are detailed in the S1 Appendix. In the overall sample, significant differences were observed in variables related to suicidal thoughts, excluding three variables: residential location (basement, rooftop, and above ground), region, and smoking status. After undersampling, significant differences persisted in all variables, except for house type, residential location, region, CCI, and smoking status. The interpretation and consideration of these results are discussed in subsequent sections.

Results of the prediction model for suicidal ideation
The results of the analysis using the six machine learning algorithms for Models 1-3, predicting suicidal thoughts in middle-aged and older individuals, are presented in Table 2. All metric values of sensitivity, specificity, accuracy, and AUC increased from Model 1 to Model 3, and among the six machine learning algorithms, logistic regression and gradient boosting consistently showed the highest performance. In Model 1, the gradient boosting- and logistic regression-based prediction models showed fair performance, with an AUC of 0.6 or higher. However, the KNN-based prediction model in Model 1 showed high values for sensitivity (0.847) and accuracy (0.83), but a very low value for specificity (0.296). This indicates that the model is relatively overfitted toward positive predictions. As additional learning variables were added from Model 1 to Model 2 and from Model 2 to Model 3, performance tended to increase gradually. Finally, the majority of the prediction models in Model 3 showed very good values above 0.80 for all indices: sensitivity, specificity, accuracy, and AUC.

Feature importance of suicidal ideation
Figs 3-5 present the top 10 feature importance variables at each stage of the models calculated by gradient boosting, which exhibited the highest accuracy. In Model 1, the most prominent feature was the household income quintile, followed by the number of household members, care recipients, region, home ownership, type of house, employment status, education level, generation, and gender. In Model 2, the most influential variable was subjective health status, followed by oral health and hearing problems, number of household members, exercise ability, pain, type of house, region, home ownership, and daily living ability. In Model 3, anxiety or depression was the most important factor, followed by depression alone, physical and mental stress, psychiatric drug use, care recipients, worrying about the future, number of household members, oral problems, unmet basic needs, and frustrating events.
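The general idea of ranking variables by their contribution to prediction can be illustrated with permutation importance, a model-agnostic technique used here as a stand-in. Gradient boosting libraries typically report impurity-based importances, and the text does not say which variant underlies Figs 3-5, so this sketch is not a reconstruction of the authors' computation. The toy rule `toy_predict`, the feature names, and the data are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importance = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importance[name] = sum(drops) / n_repeats
    return importance


# Hypothetical data: the outcome depends only on the income quintile.
rows = [{"income_quintile": q, "noise": i % 3}
        for i, q in enumerate([1, 2, 3, 4, 5] * 8)]
labels = [1 if r["income_quintile"] <= 2 else 0 for r in rows]


def toy_predict(row):
    """Stand-in for a trained classifier; it ignores the 'noise' feature."""
    return 1 if row["income_quintile"] <= 2 else 0


importance = permutation_importance(toy_predict, rows, labels)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking, importance)
```

A feature the model relies on loses accuracy when shuffled, while an ignored feature scores zero, which mirrors how a top-10 list separates influential variables from the rest.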

Results of hierarchical multiple regression analysis
From the results of the HMLR analysis, the pseudo-R² significantly improved from Models 1 to 3 (Table 3). Specifically, the pseudo-R² values were 0.06, 0.16, and 0.41 for Models 1, 2, and 3, respectively. In Model 1, a lower education level was associated with a higher likelihood of suicidal thoughts. Specifically, middle school education showed odds of 1.27 [0.74-2.19], and    In Model 2, which incorporated physical health condition factors, the odds ratios for many socioeconomic factors, including age, educational level, employment status, household income quintile, and health insurance status, were significantly reduced. However, most health behaviors and residential factors either remained unchanged or showed significant differences. Among the physical health variables, those reporting bad in subjective health status (3.28  ) become significant. Furthermore, the majority of the mental health variables, including anxiety and depression, depression alone, physical and mental stress, psychiatric drug use, worrying about the future, and frustrating events, showed high odds ratios compared to the previous factors and were statistically significant. In particular, depression alone (odds ratio 6.98 [4.80-10.15]) and severe anxiety or depression (odds ratio 12.93 [4.14-40.34]) presented extraordinarily high odds ratios.
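The bracketed ranges next to each odds ratio can be read through the usual relationship between a logistic coefficient and its interval. The sketch below assumes 95% Wald intervals (z = 1.96), which the text does not state explicitly, and uses the reported figures for depression alone (6.98 [4.80-10.15]) to recover an implied standard error. The function names are illustrative.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald interval from a logit coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by an interval that is symmetric on the log-odds scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported values for 'depression alone' in Model 3.
beta = math.log(6.98)            # coefficient on the log-odds scale
se = se_from_ci(4.80, 10.15)     # implied standard error under the 95% assumption

odds, lower, upper = odds_ratio_ci(beta, se)
print(round(se, 3), round(odds, 2), round(lower, 2), round(upper, 2))
```

Under that assumption, the reported interval is reproduced almost exactly from the point estimate, which suggests that it is symmetric on the log-odds scale.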

Discussion
The results of the study indicated that the gradient boosting algorithm outperformed the other machine learning techniques. Furthermore, the household income quintile was the most important feature in Model 1, followed by subjective health status, oral health, and hearing problems in Model 2, and anxiety and depression in Model 3. These results correspond to those of the hierarchical logistic regression. The S1 Appendix displays the basic statistical results for the samples, illustrating the distribution of suicidal ideation across the independent variables. A comparison between the total sample and the RUS sample offers insights into the RUS method, which equalizes the sample size for those with and without suicidal thinking. Unlike propensity score matching, which adjusts the control group based on the treatment group, RUS randomly reduces the control group in the total sample to achieve proportionality. For example, the gender composition of individuals who reported no suicidal ideation in the RUS sample closely resembled that of the total sample rather than aligning with those who reported affirmative responses in the RUS sample. Had the study employed propensity score matching, none of the chi-squared results would have exhibited significance. Table 2 presents the test results for the six machine learning prediction models. Evaluation metrics, including sensitivity, specificity, accuracy, and AUC, were employed to assess the performance of these models. Each metric is usually categorized as follows: < 0.6 is classified as poor, 0.6-0.69 as fair, 0.7-0.79 as good, 0.8-0.89 as very good, and anything exceeding 0.9 as excellent.
First, based on gradient boosting, which exhibited the highest prediction performance, and on AUC, the most crucial evaluation metric, Model 1 demonstrated the lowest prediction performance (ranked fair), while Model 3 presented the highest performance, ranking in the very good category. Although all performance metrics for Model 1 tended to be lower than those of Models 2 and 3, Model 1 still has significant implications for health and social policy.
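The grading bands listed above can be expressed as a small lookup function. This is an illustrative sketch only; treating a value of exactly 0.9 as "excellent" is an assumption, since the text leaves that boundary open.

```python
def grade(score):
    """Map a metric in [0, 1] to the qualitative bands used in the text."""
    if score < 0.6:
        return "poor"
    if score < 0.7:
        return "fair"
    if score < 0.8:
        return "good"
    if score < 0.9:
        return "very good"
    return "excellent"  # assumption: exactly 0.9 is counted as excellent


# Values reported for the KNN-based Model 1: sensitivity and specificity.
knn_sensitivity_grade = grade(0.847)
knn_specificity_grade = grade(0.296)
print(knn_sensitivity_grade, knn_specificity_grade)
```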
Whereas the physical and mental health conditions included in Models 2 and 3 are subjective elements that cannot be detected without individuals' self-reports, socioeconomic and residential information is observable from the outside. The fundamental problem with suicide is its unpredictability. Most people with suicidal thoughts do not share them with others because of the seriousness of the issue. In many cases, they suddenly commit suicide without any signs or expressions [29]. This tendency toward concealment makes it difficult for the government to achieve its goal of suicide prevention. However, by using socioeconomic and residential information, organizations responsible for preventing suicides can focus on target groups and intervene collectively, thereby using resources efficiently.
Moreover, the high sensitivity and low specificity of Model 1 indicate that the machine effectively predicts and identifies individuals with suicidal ideation, but also frequently misclassifies individuals without suicidal thoughts as having them. While some may perceive this as a limitation, from a positive standpoint, it underscores the potential of identifying individuals at high risk of suicide solely through socioeconomic and residential factors. Given that the purpose of prediction is to prevent suicide, results with high sensitivity and low specificity are much better than those with low sensitivity and high specificity.
After establishing the preceding prediction performance, we confirmed the feature importance of each hierarchical model using gradient boosting, which can help prioritize health policies. First, gradient boosting identified the household income quintile as the most important feature in Model 1, and variables related to economic factors, such as region, home ownership, type of house, and employment status, also had a significant impact on suicidal thoughts. We also made an interesting finding: being a care recipient emerged as an important feature. While most studies have shown that caregivers are significantly vulnerable to suicidal thinking [30,31], this study revealed an additional finding that care recipients are also more likely to consider suicide and that its importance is incomparable to that of other factors. Intuitively, it may seem that recipients would be satisfied when they receive care, but this was not the case.
This can be interpreted as follows. First, suicidal ideation could be contagious within close relationships [32]. Second, the physical and mental health of the care recipients had already deteriorated; therefore, they may have considered suicide. Third, they might feel guilty toward their caregivers because they are exposed in unwanted, embarrassing situations and must constantly depend on their caregivers' help. Finally, they may feel sorry for their families because of the costs of their living expenses and paid caregivers. Therefore, it is necessary to establish psychological education programs for both caregivers and recipients.
In Model 2, oral health problems, hearing health problems, the number of household members, and exercise ability were the major factors in terms of feature importance. It is well known that physical health affects suicidal ideation [33,34]; thus, its high priority appears to reflect the characteristics of middle-aged and older individuals. Oral health problems are an interesting point. A considerable number of studies have provided evidence that oral health is related to suicidal thoughts [35][36][37], but its importance has been underestimated compared to other variables, such as depression. This may be because oral health problems deprive people of the life-affirming pleasure of enjoying good food. Furthermore, we suggest another possible interpretation: people may unconsciously associate suicide with difficulty in eating because eating is genetically linked to survival.
In Model 3, 'anxiety or depression' and 'depression only' stood out prominently in terms of feature importance. Unfortunately, the KHP investigated 'anxiety or depression,' not 'anxiety only.' Although both variables measure depression, the 'anxiety or depression' variable includes an additional element of anxiety. Moreover, notable differences were observed in the values and distributions of the two variables. Therefore, we included both variables in the model. Anxiety and depression are closely related variables. Although depression has traditionally been considered a trigger and key factor in suicidal ideation, anxiety has received relatively less attention [12,16,38,39]. Therefore, anxiety should be considered a significant variable.
The results of the HMLR analysis (Table 3) were similar to those of the hierarchical stepwise feature importance analysis. The variables that ranked highest in feature importance in each model also had the highest regression coefficients and significance levels. However, while machine learning cannot articulate the direction in which the dependent variable changes, the HMLR analysis can.
In Model 1 of the HMLR, individuals below the median income level, medical aid recipients, tenants in multifamily houses, and those residing in commercial buildings, including all types of tenants (lump-sum deposit and monthly rent tenants), were more likely to experience suicidal ideation. However, in Model 3, in which physical and mental health variables were added, most of the socioeconomic factors became insignificant. This seems to be because income- and residence-related variables are correlated with physical and mental health factors.
Interpreting the results of the HMLR with specific reference to previous research, we first address the gender aspect. There exists a well-known paradox regarding suicide rates: although the male suicide rate is typically higher than that of females, females are more likely to consider suicide [40]. Existing studies have attributed this paradox to natural gender differences, suggesting that males tend to be impulsive with high execution abilities, whereas females are more sensitive and prone to overthinking [41,42]. In Model 1 of this study, females displayed 1.23 times higher odds of experiencing suicidal thoughts compared to males, although this disparity was not statistically significant. Moreover, in Model 3, the odds decreased to 0.9. We cautiously suggest that a confounding factor may influence the relationship between gender characteristics and suicidal ideation. Additionally, it is noteworthy that findings indicating higher rates of suicidal thoughts among women compared to men are not consistently observed. For example, Han and Lee [12] found no significant difference between genders.
Han and Lee [12] utilized a model similar to that of this study in that it employed a hierarchical approach and included comparable socioeconomic and health behavior variables. They reported insignificance for gender, age, educational level, employment status, marital status, smoking, and drinking, which aligns with the findings of this research. However, other studies reported varying results. For instance, Pompili et al. [43] discovered that highly educated individuals may resort to suicide following experiences of failure or public shame, whereas Yen [44] reported that the lower the level of education, the more suicidal thoughts occur. Qingsong et al. [45] found that employment status significantly influenced suicidal thoughts in the USA and Europe but not in Taiwan. Meanwhile, Yeong et al. [46] reported that low socioeconomic status heightens the risk of suicidal ideation among the elderly. Additionally, regarding marital status, Erwin et al. [47] discovered that having a partner had no significant effect, aligning with the results of the HMLR, and additionally revealed that experiencing the loss of a partner had an impact.
On the other hand, both the feature importance and HMLR results in this study corroborate findings from previous research indicating that mental health is the most significant factor in predicting suicidal thoughts. A wealth of literature has repeatedly confirmed results similar to those of this study [68][69][70][71][72][73][74][75][76][77][78]. All mental health variables included in the study models exerted a significant influence on suicidal ideation, except for unmet basic needs. The effects of these mental health variables are reported elsewhere, for example, for depression [68][69][70], anxiety [71,72], psychiatric drugs [73], physical and mental stress [74,75], frustrating events [76,77], and unmet basic needs [78].
Although this study's findings on factors contributing to suicidal ideation are substantial, the main finding concerns residential factors. It is well known that residential elements, including house type, influence overall mental health [79]. First, the deteriorated physical environments of the majority of commercial buildings, such as poor hygiene, small rooms that cause a sense of confinement, and dirty building walls, yield unpleasant feelings [79]. Second, due to architectural design, substantial noise stress exists in collective forms of housing, including multifamily houses and commercial buildings. In South Korea, the prevalence of noise-induced stress and interpersonal conflict among residents has become a conspicuous societal concern. The severity of these noise-related disputes has, in some instances, reached a critical threshold, resulting in homicides between neighbors [80]. Third, the compact dimensions of commercial building rooms make it difficult for residents to host visits by acquaintances, relatives, or separated family members, thereby amplifying the likelihood of social isolation. However, it is not feasible for the government to provide new housing or refurbish existing accommodations. The optimal course of action for the government involves broadening the living environment to allow social activities. Extended living areas, communal interaction, and readily available counseling and educational resources are needed, which can be provided by institutions or organizations, such as senior centers, nursing homes, or public clubs. Policy frameworks should not be exclusively confined to the public sector; there is a pressing need to incentivize initiatives within the private sector, including religious institutions and community organizations.
Residential elements also reflect socioeconomic aspects. Home ownership represents financial stability and a proactive lifestyle. Without home ownership, residing in another person's dwelling may mean relinquishing autonomy over one's life and living space to others, potentially causing psychological distress and melancholy. For several years, housing crises related to skyrocketing real estate prices have been pervasive in South Korea [81,82]. Those who could not buy houses in this period experienced regret and sadness because the increased prices made them believe that they could never own a house, even if they worked for several decades. The majority of these people were rental residents [82,83].
Specifically, middle-aged and older individuals face uncertainty regarding the duration of their employment and contend with income instability, and therefore live with constant concern about eviction if they are unable to meet rental obligations. Fortunately, the Right to Renew House Lease Contracts, which guarantees a residence period of at least 2 years, was established by a 2020 legal amendment of the Housing Lease Protection Act in Korea [83]. Therefore, a follow-up study is needed to evaluate suicidal ideation among tenants after the establishment of the law.
This research contributes to the existing literature by adopting a novel approach: machine learning algorithms. It offers a comprehensive explanation by comparing the outcomes of machine learning with those derived from conventional logistic regression. Furthermore, the study brings attention to a previously overlooked aspect: individuals who are care recipients, have low income, and reside in vulnerable residential conditions are susceptible to suicidal ideation. This finding addresses a gap in previous research. Despite these strengths, the study has notable limitations. Model 3 of the HMLR includes an excessive number of variables, potentially leading to a model complexity problem. While removing non-significant variables could be considered optimal, Table 3 is designed to showcase the hierarchical structure and facilitate comparison with the machine learning process. As a result, we present the naive results of the models without excluding any variables.
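To make the hierarchical (block-wise) comparison concrete, the following is a minimal illustrative sketch, not the code used in this study. It assumes synthetic data, one hypothetical variable per block (`low_income`, `poor_health`, `depression`), and a plain gradient-ascent logistic regression written only for illustration. It fits three nested models and compares their log-likelihoods with a likelihood-ratio statistic; the critical value 3.84 is the 5% chi-square threshold for one added variable.

```python
import math
import random

random.seed(0)

# Synthetic stand-ins for the three variable blocks (hypothetical names and effects).
N = 600
rows, labels = [], []
for _ in range(N):
    low_income = random.gauss(0, 1)    # block 1: socioeconomic factor
    poor_health = random.gauss(0, 1)   # block 2: physical health status
    depression = random.gauss(0, 1)    # block 3: mental health condition
    z = -2.0 + 0.5 * low_income + 0.7 * poor_health + 1.5 * depression
    p = 1.0 / (1.0 + math.exp(-z))
    rows.append([low_income, poor_health, depression])
    labels.append(1 if random.random() < p else 0)


def fit_log_likelihood(X, y, lr=0.5, epochs=300):
    """Fit a logistic regression by batch gradient ascent; return its log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        ll += yi * z - math.log1p(math.exp(z))  # Bernoulli log-likelihood term
    return ll


# Each model keeps the previous block and adds the next one (nested models).
blocks = {"Model 1": 1, "Model 2": 2, "Model 3": 3}  # number of leading columns used
log_liks = [fit_log_likelihood([r[:k] for r in rows], labels) for k in blocks.values()]

# Likelihood-ratio statistic for each added block (one extra variable, df = 1).
CHI2_CRIT_DF1 = 3.84
lr_stats = [2 * (b - a) for a, b in zip(log_liks, log_liks[1:])]

for name, ll in zip(blocks, log_liks):
    print(f"{name}: log-likelihood = {ll:.1f}")
for step, stat in zip(("Model 1 -> 2", "Model 2 -> 3"), lr_stats):
    print(f"{step}: LR statistic = {stat:.1f}, improves fit at 5%: {stat > CHI2_CRIT_DF1}")
```

In a real analysis each block would contain many variables, so the degrees of freedom and critical value would change accordingly, and an established statistical package would normally replace the hand-written fitting routine.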

Conclusion
This study identified factors that could affect suicidal ideation in middle-aged and older people across as many categories as possible and prioritized them through a combination of machine learning techniques and traditional statistical techniques. In the analysis, anxiety and depression ranked highest in the most expanded final model, although physical health factors also played an important role. A hierarchical approach revealed that variables related to care recipients and residential areas were significantly associated with suicidal thoughts. Although it is not possible to directly and dramatically change people's socioeconomic levels and living patterns, culture and policies that can be integrated with the local community should be developed by understanding why people in this environment have suicidal ideation. A few decades ago, a Korean proverb was common: 'A neighbor is better than a distant relative.' Unfortunately, with the prevalence of apartment-style residential complexes and the emergence of individualism during the rapid urban development in Korea, public interactions among neighbors have substantially diminished. It is crucial to revive both the phrase and the cultural values associated with it.